Novel Probabilistic Finite-State Transducers for Cognate and Transliteration Modeling
Author
Abstract
We present and empirically compare a range of novel probabilistic finite-state transducer (PFST) models targeted at two major natural language string transduction tasks: transliteration selection and cognate translation selection. Evaluation on 10 distinct language-pair data sets shows that, in every case, the novel models consistently and substantially outperform a well-established standard reference algorithm.
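To make the idea of a PFST concrete: such a transducer assigns a probability to a (source, target) string pair by summing over all paths that read one string while writing the other. The sketch below assumes the simplest case, a single-state stochastic edit transducer whose arcs are character substitutions, insertions, and deletions plus a stop event. It scores a pair with the forward algorithm, and selection picks the candidate with the highest probability. It is not one of the paper's novel models. The function names `toy_params`, `pair_probability`, and `select_best`, and the hand-set weights, are hypothetical illustrations rather than learned parameters.

```python
def toy_params(alphabet, identity=10.0, other=1.0, indel=0.5, stop=1.0):
    """Build hypothetical, hand-set edit weights and normalize them so that all
    edit operations plus the stop event sum to 1 (a proper distribution)."""
    sub = {(a, b): (identity if a == b else other) for a in alphabet for b in alphabet}
    ins = {b: indel for b in alphabet}
    dele = {a: indel for a in alphabet}
    total = sum(sub.values()) + sum(ins.values()) + sum(dele.values()) + stop

    def norm(weights):
        return {k: v / total for k, v in weights.items()}

    return norm(sub), norm(ins), norm(dele), stop / total


def pair_probability(src, tgt, params):
    """Forward algorithm over a single-state edit transducer: total probability
    of emitting the pair (src, tgt), summed over every edit alignment."""
    p_sub, p_ins, p_del, p_stop = params
    n, m = len(src), len(tgt)
    # alpha[i][j] = probability of having consumed src[:i] and emitted tgt[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete src[i-1]
                alpha[i][j] += alpha[i - 1][j] * p_del.get(src[i - 1], 0.0)
            if j > 0:  # insert tgt[j-1]
                alpha[i][j] += alpha[i][j - 1] * p_ins.get(tgt[j - 1], 0.0)
            if i > 0 and j > 0:  # substitute src[i-1] -> tgt[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * p_sub.get((src[i - 1], tgt[j - 1]), 0.0)
    return alpha[n][m] * p_stop


def select_best(src, candidates, params):
    """Selection task: return the candidate with the highest joint probability."""
    return max(candidates, key=lambda c: pair_probability(src, c, params))


if __name__ == "__main__":
    params = toy_params(sorted(set("color" + "kalar" + "kolor")))
    print(select_best("color", ["kalar", "kolor"], params))
```

Running it prints `kolor`: because identity substitutions are weighted ten times higher than mismatches, the candidate that differs from "color" in one position outscores the one that differs in three. A practical system would estimate the arc probabilities from training pairs (for example with expectation-maximization) instead of setting them by hand.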
Similar resources
Automata for Transliteration and Machine Translation
Automata theory, transliteration, and machine translation (MT) have an interesting and intertwined history. Finite-state string automata theory became a powerful tool for speech and language after the introduction of AT&T's FSM software. For example, string transducers can convert between word sequences and phoneme sequences, or between phoneme sequences and acoustic sequences; furthermore,...
Konkanverter - A Finite State Transducer based Statistical Machine Transliteration Engine for Konkani Language
We have developed a finite state transducer based transliteration engine called Konkanverter that performs statistical machine transliteration between three different scripts used to write the Konkani language. The statistical machine transliteration system consists of cascading finite state transducers combining both rule-based and statistical approaches. Based on the limited availability of p...
Hindi Urdu Machine Transliteration using Finite-State Transducers
Finite-state Transducers (FST) can be used very efficiently to implement inter-dialectal transliteration. We illustrate this on the Hindi and Urdu language pair. FSTs can also be used for translation between surface-close languages. We introduce UIT (universal intermediate transcription) for the same pair on the basis of their common phonetic repository in such a way that it can be extended to other l...
Transliterated Mobile Keyboard Input via Weighted Finite-State Transducers
We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high accuracy decoding u...
Discriminative Methods for Transliteration
We present two discriminative methods for name transliteration. The methods correspond to local and global modeling approaches in modeling structured output spaces. Neither method requires alignment of names in different languages; their features are computed directly from the names themselves. We perform an experimental evaluation of the methods for name transliteration from three languag...